SamBaTen: Sampling-based Batch Incremental Tensor Decomposition
Authors
Abstract
Tensor decompositions are invaluable tools in analyzing multimodal datasets. In many real-world scenarios, such datasets are far from static; on the contrary, they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its "time" mode. How can we maintain a valid and accurate tensor decomposition of such a dynamically evolving multimodal dataset, without having to re-compute the entire decomposition after every single update? In this paper we introduce SamBaTen, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which incrementally maintains the decomposition given new updates to the tensor dataset. SamBaTen is able to scale to datasets that the state-of-the-art in incremental tensor decomposition is unable to operate on, due to its ability to effectively summarize the existing tensor and the incoming updates, and to perform all computations in the reduced summary space. We extensively evaluate SamBaTen using synthetic and real datasets. Indicatively, SamBaTen achieves comparable accuracy to state-of-the-art incremental and non-incremental techniques, while being 25-30 times faster. Furthermore, SamBaTen scales to very large sparse and dense dynamically evolving tensors of dimensions up to 100K × 100K × 100K, where state-of-the-art incremental approaches were not able to operate.
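To make the incremental setting concrete, the sketch below (plain NumPy, hypothetical names, and not the full SamBaTen algorithm, which additionally samples sub-tensors and stitches the results back from the reduced summary space) shows the basic batch update such methods build on: when new slices arrive along the time mode, the temporal factor of an existing CP decomposition is extended by least squares while the other factors are held fixed.

```python
import numpy as np

def extend_time_factor(X_new, A, B, C):
    """Append rows to the temporal CP factor C for a batch of new
    frontal slices X_new (shape I x J x t_new), keeping A (I x R)
    and B (J x R) fixed. Hypothetical helper, not SamBaTen itself."""
    I, J, t_new = X_new.shape
    R = A.shape[1]
    # Khatri-Rao product: KR[i*J + j, r] = A[i, r] * B[j, r]
    KR = (A[:, None, :] * B[None, :, :]).reshape(I * J, R)
    # Mode-3 unfolding: one flattened new slice per row
    X3 = X_new.reshape(I * J, t_new).T
    # Normal equations of the least-squares problem X3 ≈ C_new @ KR.T
    gram = (A.T @ A) * (B.T @ B)            # equals KR.T @ KR
    C_new = np.linalg.solve(gram, (X3 @ KR).T).T
    return np.vstack([C, C_new])
```

The abstract's point is that SamBaTen avoids even this cost on the full data: it performs the update on compact sampled summaries of the old tensor and the incoming batch, so the work depends on the summary size rather than the full dimensions.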
Similar Resources
Scalable Bayesian Non-negative Tensor Factorization for Massive Count Data
We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conju...
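The reparameterization mentioned in the snippet rests on a standard identity: if the components of a count are independent Poissons, their total is Poisson and, conditioned on the total, the split across components is multinomial. A quick NumPy check (illustrative only, not the paper's inference code):

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.5, 1.5, 3.0])      # latent per-component rates
n = 100_000

# Direct: independent Poisson draws per component
direct = rng.poisson(rates, size=(n, len(rates)))

# Reparameterized: Poisson total, then a multinomial split
totals = rng.poisson(rates.sum(), size=n)
split = np.array([rng.multinomial(t, rates / rates.sum()) for t in totals])

print(direct.mean(axis=0))   # ≈ [0.5, 1.5, 3.0]
print(split.mean(axis=0))    # ≈ [0.5, 1.5, 3.0]
```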
Tensor Train decomposition on TensorFlow (T3F)
Tensor Train decomposition is used across many branches of machine learning, but until now it lacked an implementation with GPU support, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure in order to construct efficient optimization methods. In this work, we propose a library th...
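For readers unfamiliar with the format, the classical TT-SVD construction underlying such libraries can be sketched in a few lines of NumPy (an illustrative sketch under that assumption, not the t3f API):

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense ndarray into tensor-train cores via
    sequential truncated SVDs (illustrative, not the t3f API)."""
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(shape[0], -1)
    for n in shape[:-1]:
        mat = mat.reshape(rank * n, -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r_new = min(max_rank, len(s))
        cores.append(U[:, :r_new].reshape(rank, n, r_new))
        mat = s[:r_new, None] * Vt[:r_new]
        rank = r_new
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

# Round-trip check: contract the train back into a full tensor
X = np.random.default_rng(1).standard_normal((4, 5, 6))
cores = tt_svd(X, max_rank=30)          # rank large enough to be exact
full = cores[0]
for core in cores[1:]:
    full = np.tensordot(full, core, axes=([-1], [0]))
print(np.allclose(full.reshape(X.shape), X))   # True
```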
Kronecker Determinantal Point Processes
Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of N items. They have recently gained prominence in several applications that rely on "diverse" subsets. However, their applicability to large problems is still limited due to the O(N³) complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs by introducing...
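The Kronecker structure pays off because the spectrum of a Kronecker product factorizes, so spectral quantities that normally cost O(N³), such as the DPP normalizer det(L + I), reduce to computations on the much smaller factors. A small numerical illustration (not the paper's sampling algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_psd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T                      # PSD kernel factor

L1, L2 = random_psd(3), random_psd(4)
L = np.kron(L1, L2)                     # kernel over N = 12 items

# log det(L + I) two ways: on the full kernel, and from the factor
# spectra, since the eigenvalues of kron(L1, L2) are {e1[i] * e2[j]}.
e1, e2 = np.linalg.eigvalsh(L1), np.linalg.eigvalsh(L2)
direct = np.linalg.slogdet(L + np.eye(len(L)))[1]
factored = np.log1p(np.outer(e1, e2)).sum()
print(np.isclose(direct, factored))     # True
```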
Characterization of Deterministic and Probabilistic Sampling Patterns for Finite Completability of Low Tensor-Train Rank Tensor
In this paper, we analyze the fundamental conditions for low-rank tensor completion given the separation or tensor-train (TT) rank, i.e., ranks of unfoldings. We exploit the algebraic structure of the TT decomposition to obtain the deterministic necessary and sufficient conditions on the locations of the samples to ensure finite completability. Specifically, we propose an algebraic geometric an...
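The TT (separation) ranks the snippet refers to are simply the matrix ranks of the sequential unfoldings, which is easy to verify numerically (illustrative example, unrelated to the paper's sampling analysis):

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a 4 x 5 x 6 tensor from TT cores of rank 2
G1 = rng.standard_normal((1, 4, 2))
G2 = rng.standard_normal((2, 5, 2))
G3 = rng.standard_normal((2, 6, 1))
X = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)

# TT rank k = rank of the unfolding with the first k modes as rows
for k in (1, 2):
    unfolding = X.reshape(int(np.prod(X.shape[:k])), -1)
    print(k, np.linalg.matrix_rank(unfolding))   # prints 2 for both
```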
Sublinear Time Orthogonal Tensor Decomposition
A recent work (Wang et al., NIPS 2015) gives the fastest known algorithms for orthogonal tensor decomposition with provable guarantees. Their algorithm is based on computing sketches of the input tensor, which requires reading the entire input. We show that in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor. In...
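The primitive behind such guarantees is the tensor power method: for a symmetric tensor with an orthogonal decomposition, repeated contraction v ← T(I, v, v) converges to one of the components. A minimal dense version in NumPy follows (the paper's contribution is running this sublinearly via sketches; this sketch reads the whole tensor):

```python
import numpy as np

rng = np.random.default_rng(4)

# T = sum_r w[r] * v_r ⊗ v_r ⊗ v_r with orthonormal v_r
V, _ = np.linalg.qr(rng.standard_normal((8, 3)))
w = np.array([3.0, 2.0, 1.0])
T = np.einsum('r,ir,jr,kr->ijk', w, V, V, V)

# Tensor power iteration: v <- T(I, v, v), normalized each step
v = rng.standard_normal(8)
v /= np.linalg.norm(v)
for _ in range(50):
    v = np.einsum('ijk,j,k->i', T, v, v)
    v /= np.linalg.norm(v)

# v aligns with one of the columns of V (up to sign)
print(np.max(np.abs(V.T @ v)))          # ≈ 1.0
```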
Journal: CoRR
Volume: abs/1709.00668
Pages: -
Published: 2017